To Improve the Robustness of LSTM-RNN Acoustic Models Using Higher-Order Feedback from Multiple Histories
Abstract
This paper investigates a novel multiple-history long short-term memory (MH-LSTM) RNN acoustic model that mitigates the robustness problem posed by noisy training targets, i.e., mis-labeled data and/or mis-alignments. Conceptually, after the RNN is unfolded in time, the hidden units in each layer are re-arranged into ordered sub-layers, with a master sub-layer on top and a set of auxiliary sub-layers below it. Only the master sub-layer generates outputs for the next layer; the auxiliary sub-layers run in parallel with the master sub-layer but with increasing time lags. Each sub-layer also receives higher-order feedback from a fixed number of sub-layers below it. As a result, each sub-layer maintains a different history of the input speech, and the ensemble of these different histories underpins the model's robustness. The higher-order connections not only provide shorter feedback paths along which error signals can propagate to hidden states farther back in time, better modeling long-term memory, but also give each model parameter more feedback paths, smoothing its updates during training. Phoneme recognition results on both real TIMIT data and synthetic TIMIT data with noisy labels or alignments show that the new model outperforms a conventional LSTM RNN model.
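To make the architecture concrete, below is a minimal PyTorch sketch of one plausible reading of an MH-LSTM layer: `num_sub` sub-layers (sub-layer 0 is the master), each an `nn.LSTMCell`, where sub-layer k consumes the input delayed by k frames and receives additive higher-order feedback from up to `order` sub-layers below it. The class name, the additive feedback injection, and all hyper-parameter names are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: one plausible reading of an MH-LSTM layer.
# Names (MHLSTMLayer, num_sub, order) and the additive feedback injection
# are assumptions; the paper's exact gating/wiring may differ.
import torch
import torch.nn as nn

class MHLSTMLayer(nn.Module):
    def __init__(self, input_size, hidden_size, num_sub=3, order=2):
        super().__init__()
        self.num_sub, self.order = num_sub, order
        # Sub-layer 0 is the master; sub-layers 1..num_sub-1 are auxiliary.
        self.cells = nn.ModuleList(
            nn.LSTMCell(input_size, hidden_size) for _ in range(num_sub))
        # One projection per (sub-layer, lower-sub-layer) feedback pair.
        self.fb = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size, bias=False)
            for _ in range(num_sub * order))

    def forward(self, x):                       # x: (T, B, input_size)
        T, B, F = x.shape
        H = self.cells[0].hidden_size
        h = [x.new_zeros(B, H) for _ in range(self.num_sub)]
        c = [x.new_zeros(B, H) for _ in range(self.num_sub)]
        out = []
        for t in range(T):
            nh, nc = [], []
            for k in range(self.num_sub):
                # Sub-layer k runs k frames behind the master (its time lag).
                xt = x[t - k] if t >= k else x.new_zeros(B, F)
                # Higher-order feedback from up to `order` sub-layers below,
                # injected additively into the recurrent hidden state.
                rec = h[k]
                for j in range(self.order):
                    if k + 1 + j < self.num_sub:
                        rec = rec + self.fb[k * self.order + j](h[k + 1 + j])
                hk, ck = self.cells[k](xt, (rec, c[k]))
                nh.append(hk)
                nc.append(ck)
            h, c = nh, nc
            out.append(h[0])        # only the master sub-layer emits output
        return torch.stack(out)     # (T, B, hidden_size)

# Usage: 20 frames of 40-dim features, batch of 4.
layer = MHLSTMLayer(input_size=40, hidden_size=64)
print(layer(torch.randn(20, 4, 40)).shape)  # torch.Size([20, 4, 64])
```

Note that only the master sub-layer's hidden states are emitted, matching the abstract's description that the auxiliary sub-layers influence the output solely through the feedback connections.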
Similar Resources
Fast and accurate recurrent neural network acoustic models for speech recognition
We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models in...
Long short-term memory recurrent neural network architectures for large scale acoustic modeling
Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we explore LSTM RNN architectures for large scale acoustic modeling in speech recognition. We recently showed that LSTM RNNs are more effective than DNNs and conventional RNN...
Acoustic Modeling in Statistical Parametric Speech Synthesis – from HMM to LSTM-RNN
Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder to render speech given a text. Typically decision tree-clustered context-dependent hidden Markov models (HMMs) are employed as the acoustic model, which represent a relationship between linguistic and acoustic features. Recently, artificial neural network-based acoustic models, such as deep neural networks, ...
High Order Recurrent Neural Networks for Acoustic Modelling
Vanishing long-term gradients are a major issue in training standard recurrent neural networks (RNNs), which can be alleviated by long short-term memory (LSTM) models with memory cells. However, the extra parameters associated with the memory cells mean an LSTM layer has four times as many parameters as an RNN with the same hidden vector size. This paper addresses the vanishing gradient problem...
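As a rough companion to this snippet, here is a minimal sketch of the generic higher-order recurrence it alludes to, h_t = tanh(W x_t + Σ_j U_j h_{t-j}): a vanilla RNN cell that conditions on the last `order` hidden states instead of only the most recent one, avoiding the 4x parameter overhead of LSTM gates. The class name and wiring are assumptions for illustration, not the cited paper's exact formulation.

```python
# Generic higher-order RNN recurrence: h_t = tanh(W x_t + sum_j U_j h_{t-j}).
# Illustrative only; the cited paper's exact formulation may differ.
import torch
import torch.nn as nn

class HigherOrderRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, order=3):
        super().__init__()
        self.order = order
        self.w_in = nn.Linear(input_size, hidden_size)
        self.u_rec = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size, bias=False)
            for _ in range(order))

    def forward(self, x, history):  # history: past hidden states, newest first
        s = self.w_in(x)
        for j, h_prev in enumerate(history[: self.order]):
            s = s + self.u_rec[j](h_prev)
        return torch.tanh(s)

# Usage: run the cell over 8 frames, keeping the newest states first.
cell = HigherOrderRNNCell(input_size=40, hidden_size=64)
x, history = torch.randn(8, 4, 40), []
for t in range(8):
    history.insert(0, cell(x[t], history))
print(history[0].shape)  # torch.Size([4, 64])
```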
Semi-Supervised Training in Deep Learning Acoustic Model
We studied semi-supervised training in a fully connected deep neural network (DNN), an unfolded recurrent neural network (RNN), and a long short-term memory recurrent neural network (LSTM-RNN) with respect to transcription quality, importance-based data sampling, and training data amount. We found that the DNN, unfolded RNN, and LSTM-RNN are increasingly sensitive to labeling errors. For ...
Publication year: 2017